Exploring Pipeline Computers, Array Processors, and Multiprocessor Systems
Understanding the fundamental architectures that enable parallel processing in modern computing
Parallel computers are systems that employ parallel processing. Their basic features are listed below:
- Perform overlapped computations to exploit temporal parallelism, i.e., executing multiple instructions in overlapping time periods (pipelining)
- Use multiple synchronized arithmetic logic units to exploit spatial parallelism, i.e., executing multiple operations simultaneously across multiple processing units
- Achieve asynchronous parallelism through a set of interactive processors with shared resources, i.e., multiple processors working independently on different tasks
The execution of an instruction on a digital computer involves four steps (a small sketch follows the list):
1. Fetch the instruction from main memory
2. Decode the instruction to identify the operation to be performed
3. Fetch the operands, if any are needed for the execution
4. Execute the decoded arithmetic/logic operation
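As a rough illustration of these four steps, here is a toy sketch in Python; the instruction format, addresses, and register names are invented for the example and do not model any particular machine.

```python
# Toy sketch of one instruction cycle (hypothetical one-instruction machine).
memory = {0: ("ADD", 10, 11), 10: 3, 11: 4}   # one instruction plus two operands
registers = {"PC": 0, "ACC": 0}

def step():
    instruction = memory[registers["PC"]]        # 1. fetch the instruction
    opcode, addr1, addr2 = instruction           # 2. decode the operation
    op1, op2 = memory[addr1], memory[addr2]      # 3. fetch the operands
    if opcode == "ADD":                          # 4. execute the operation
        registers["ACC"] = op1 + op2
    registers["PC"] += 1

step()
print(registers["ACC"])  # 7
```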
In non-pipelined computers, these four steps must finish before the next instruction can start. However, in a pipelined computer, successive instructions are executed concurrently in an overlapped manner.
The instruction cycle is made up of multiple pipeline cycles. A pipeline cycle can be set to the delay of the slowest stage. Data flows from stage to stage on each cycle, triggered by a common pipeline clock. All stages operate synchronously under this clock. Interface latches between stages hold intermediate results.
One instruction still takes four pipeline cycles to complete, but once the pipeline is full, a new result emerges on every cycle.
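To make the overlap concrete, the sketch below (assuming the four stages above, one instruction entering the pipeline per cycle, and no stalls or hazards) prints which instruction occupies each stage on every pipeline cycle.

```python
# Minimal pipeline-timing sketch for an assumed 4-stage pipeline.
STAGES = ["fetch", "decode", "operand", "execute"]

def pipeline_schedule(num_instructions):
    """Return, for each cycle, which instruction occupies each stage (or None)."""
    total_cycles = num_instructions + len(STAGES) - 1
    schedule = []
    for cycle in range(total_cycles):
        row = []
        for stage in range(len(STAGES)):
            instr = cycle - stage          # instruction i enters stage s at cycle i + s
            row.append(instr if 0 <= instr < num_instructions else None)
        schedule.append(row)
    return schedule

for cycle, row in enumerate(pipeline_schedule(6)):
    print(f"cycle {cycle}:", {s: i for s, i in zip(STAGES, row)})
# 6 instructions finish in 6 + 4 - 1 = 9 cycles instead of 6 * 4 = 24 sequential cycles.
```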
Because of the overlapped instruction fetch/decode and execution, pipelines are well-suited for repeatedly performing the same operations. When the operation changes (e.g. from add to multiply), the pipeline must be drained and reconfigured, causing delays. Thus, pipelines are most attractive for vector processing with repeated operations.
Modern processors use deep pipelines (often 14-19 stages, and 20 or more in the Pentium 4) to achieve high clock speeds, while mobile processors typically use shorter pipelines (8-13 stages) for better energy efficiency.
An array processor is a synchronized parallel computer with multiple arithmetic logic units, referred to as processing elements (PEs), that operate simultaneously in lockstep fashion. By replicating ALUs, spatial parallelism is achieved.
Scalar and control-type instructions are executed directly in the control unit
Each PE has an ALU with registers and local memory
The PEs are interconnected by a data routing network
The interconnection pattern established for a specific computation is set under program control. Vector instructions are broadcast to the PEs for distributed execution over the different component operands, which are fetched directly from the PEs' local memories. The PEs are passive devices without instruction decoding capabilities.
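The following minimal sketch (a simulation only, with a made-up PE count and data) mimics this idea: each PE holds its own operands in local memory, and the control unit broadcasts one vector instruction that every PE applies to its own components in the same step.

```python
# SIMD lockstep sketch: one broadcast instruction, many processing elements.
NUM_PES = 8

# Local memories of the PEs: each PE holds one component of vectors A and B.
local_a = [i for i in range(NUM_PES)]
local_b = [10 * i for i in range(NUM_PES)]

def broadcast(op, dst, src1, src2):
    """Control unit broadcasts a vector instruction; all PEs execute it in lockstep."""
    for pe in range(NUM_PES):          # conceptually simultaneous, one iteration per PE
        dst[pe] = op(src1[pe], src2[pe])

result = [0] * NUM_PES
broadcast(lambda x, y: x + y, result, local_a, local_b)
print(result)   # component-wise sum computed "simultaneously" by the PEs
```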
Additionally, associative memory, which is content addressable, will be examined in the context of parallel processing. In an associative memory, locations are accessed by their content rather than by address, and multiple locations can be searched simultaneously. Array processors designed with associative memory are called associative processors.
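A small sketch of the idea, simulated sequentially in Python (the memory contents and key are made up): the search key is compared against every stored word, and the result is the set of matching locations rather than the contents of one given address.

```python
# Content-addressable lookup: in hardware every word is compared against the
# key in parallel; here the comparison is simulated with a comprehension.
memory = [42, 17, 42, 8, 99, 42]
key = 42

matches = [addr for addr, word in enumerate(memory) if word == key]
print(matches)  # [0, 2, 5] -- all locations holding the value 42
```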
Parallel algorithms on array processors will be presented for the following problems (a sketch of one of them follows the list):
- Efficient parallel computation of matrix products
- Merging multiple sorted lists into one
- Parallel sorting algorithms such as bitonic sort
- Fast Fourier Transform (FFT) algorithms
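As one example, here is a minimal sketch of bitonic sort, with sequential Python standing in for the compare-exchange steps that an array processor would perform in lockstep; the input length must be a power of two.

```python
def bitonic_sort(a, ascending=True):
    """Sort a list whose length is a power of two using the bitonic network."""
    if len(a) <= 1:
        return list(a)
    half = len(a) // 2
    first = bitonic_sort(a[:half], True)     # build an ascending run
    second = bitonic_sort(a[half:], False)   # and a descending run -> bitonic sequence
    return bitonic_merge(first + second, ascending)

def bitonic_merge(a, ascending):
    if len(a) <= 1:
        return list(a)
    half = len(a) // 2
    a = list(a)
    for i in range(half):
        # Compare-exchange pairs a fixed distance apart; on an array processor
        # all of these comparisons would happen in the same lockstep cycle.
        if (a[i] > a[i + half]) == ascending:
            a[i], a[i + half] = a[i + half], a[i]
    return bitonic_merge(a[:half], ascending) + bitonic_merge(a[half:], ascending)

print(bitonic_sort([7, 3, 9, 1, 6, 2, 8, 5]))  # [1, 2, 3, 5, 6, 7, 8, 9]
```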
Massively parallel SIMD computers of the 1980s are famous examples of this approach. Modern GPUs use SIMD principles for parallel processing, and digital signal processors often use array processing techniques.
The goal of researching and developing multiprocessor systems is to enhance throughput, reliability, flexibility, and availability.
- Throughput: increased processing capability by utilizing multiple processors
- Reliability: the system can continue operating even if one processor fails
- Flexibility: the system can be reconfigured for different workloads
- Availability: system resources are accessible when needed
The fundamental multiprocessor design has two or more processors with similar capabilities. All processors have access to the same memory modules, I/O channels, and peripherals. Most critically, the entire system must be controlled by a single integrated operating system that enables interaction between processors and their programs.
In addition to the shared memories and I/O devices, each processor has its own local memory and private devices. Processors can communicate through the shared memories or the interrupt network.
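The sketch below uses Python's multiprocessing module as a stand-in for this organization: independent worker processes update a counter held in shared memory, with a lock playing the role of interprocessor synchronization. The counts and number of workers are arbitrary.

```python
from multiprocessing import Process, Value, Lock

def worker(counter, lock, n):
    # Each "processor" works independently and communicates results
    # through the shared memory location.
    for _ in range(n):
        with lock:
            counter.value += 1

if __name__ == "__main__":
    counter = Value("i", 0)   # shared memory accessible to all processes
    lock = Lock()             # coordination, analogous to interprocessor synchronization
    procs = [Process(target=worker, args=(counter, lock, 1000)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)      # 4000: every worker updated the shared resource
```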
Multiprocessor hardware organization is determined largely by the interconnection structure used between the memories and processors. The three common interconnections are listed below (a small sketch contrasting the first two follows the list):
- Time-shared common bus: the simplest interconnection, in which all processors and memory modules share a common bus
- Crossbar switch network: allows multiple simultaneous connections between processors and memory modules
- Multiport memory: memory modules have multiple ports for direct connection to processors
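The toy sketch below contrasts the first two structures: a set of simultaneous (processor, memory module) connection requests can all be granted on a crossbar as long as no processor or memory module appears twice, whereas a time-shared bus serves only one transfer per cycle. The request format is invented for the example.

```python
# Requests are (processor, memory_module) pairs asking for a connection
# in the same cycle.
def grantable_on_bus(requests):
    # A time-shared bus carries only one processor-memory transfer at a time.
    return len(requests) <= 1

def grantable_on_crossbar(requests):
    # A crossbar grants all requests at once as long as no processor and no
    # memory module is involved in more than one connection.
    procs = [p for p, m in requests]
    mems = [m for p, m in requests]
    return len(set(procs)) == len(procs) and len(set(mems)) == len(mems)

requests = [(0, 2), (1, 0), (3, 1)]
print(grantable_on_bus(requests))       # False: the bus must serialize them
print(grantable_on_crossbar(requests))  # True: three disjoint connections
```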
Such designs appear today in servers and high-end workstations (e.g., Intel Xeon, AMD EPYC), in chips that integrate multiple processor cores (e.g., ARM big.LITTLE), and in large-scale multiprocessor systems for distributed computing.
| Structure | Parallelism Type | Key Features | Best For |
|---|---|---|---|
| Pipeline Computers | Temporal | Overlapped instruction execution, synchronized stages | Vector processing, repetitive operations |
| Array Processors | Spatial | Multiple synchronized ALUs, lockstep operation | Data-parallel tasks, matrix operations |
| Multiprocessor Systems | Asynchronous | Interactive processors, shared resources | General-purpose computing, high availability |
Modern computer systems often combine elements from all three parallel structures. For example:
- Pipelining (temporal) combined with multiple cores (spatial) and shared caches (multiprocessor)
- Array processing principles combined with pipelining and multiprocessor designs
- All three structures integrated at different levels of the architecture
Looking further ahead, active research directions include bio-inspired parallel architectures for AI and machine learning, quantum parallelism for potentially exponential speedups, and large-scale parallel computing across global networks.
In summary, the three fundamental parallel computer structures each exploit a different form of parallelism:
- Pipeline computers exploit temporal parallelism through overlapped execution
- Array processors exploit spatial parallelism through multiple synchronized ALUs
- Multiprocessor systems exploit asynchronous parallelism through interactive processors with shared resources
Understanding these three fundamental parallel computer structures is essential for designing and implementing efficient computing systems. Each structure has its strengths and is suited for different types of applications. The future of computing lies in hybrid approaches that combine the best features of all three structures to meet the ever-increasing demands for computational power.